Abstract: We consider the problem of recovering a 
low-rank matrix M from a small number of random 
linear measurements. A popular and useful example of this 
problem is matrix completion, in which the measurements 
reveal the values of a subset of the entries, and we wish 
to fill in the missing entries (this is the famous Netflix 
problem). When M is believed to have low rank, one would 
ideally try to recover M by finding the minimum-rank 
matrix that is consistent with the data; this is, however, 
problematic since this is a nonconvex problem that is, 
generally, intractable. 

Nuclear-norm minimization has been proposed as a 
tractable approach, and past papers have delved into 
the theoretical properties of nuclear-norm minimization 
algorithms, establishing conditions under which minimizing 
the nuclear norm yields the minimum rank solution. We 
review this emerging body of literature and extend and 
refine previous theoretical results. Our focus is on providing 
error bounds when M is well approximated by a low-rank 
matrix, and when the measurements are corrupted with 
noise. We show that for a certain class of random linear 
measurements, nuclear-norm minimization provides stable 
recovery from a number of samples nearly at the theoretical 
lower limit, and enjoys order-optimal error bounds (with 
high probability). 

I. Introduction 

Low-rank matrix recovery is a quickly developing 
research area with a growing list of applications such as 
collaborative filtering, machine learning, control, remote 
sensing, computer vision, and quantum state tomography.
In its most general (noiseless) form the problem consists
of recovering a low-rank matrix, $M \in \mathbb{R}^{n_1 \times n_2}$,
from a series of $m$ linear measurements,
$\langle A_1, M\rangle, \langle A_2, M\rangle, \ldots, \langle A_m, M\rangle$
(we use the usual inner product
$\langle X, Y\rangle = \mathrm{Tr}(X^*Y) = \sum_{i,j} X_{ij} Y_{ij}$).
The $A_i$'s are known and are analogous to the rows of a compressed
sensing matrix. To consolidate the presentation, we write
the linear model more compactly as $\mathcal{A}(M)$ for the linear
operator $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$
(the $i$th entry of $\mathcal{A}(X)$ is $\langle A_i, X\rangle$).

If computational time were not an issue, one would
ideally reconstruct $M$ by solving

$$\begin{array}{ll} \text{minimize} & \operatorname{rank}(X) \\ \text{subject to} & \mathcal{A}(X) = \mathcal{A}(M), \end{array} \tag{I.1}$$

where $X \in \mathbb{R}^{n_1 \times n_2}$ is the decision variable.
Unfortunately, rank minimization is an intractable problem
(aside from a few rare special cases) and is in fact
provably NP-hard and hard to approximate [8], [14]. To
overcome this problem, nuclear-norm minimization has 
been introduced as the tightest convex relaxation of rank 
minimization [4], [6], [9], [10], [15]. Here, one solves 
instead, 

$$\begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \mathcal{A}(X) = \mathcal{A}(M). \end{array} \tag{I.2}$$

Due to its convexity, the nuclear-norm minimization 
problem is tractable (and an SDP) and a number of fast 
algorithms have been proposed to solve it [1], [13]. 

A recent influx of papers has shown that for a broad 
range of low-rank matrix recovery problems, nuclear- 
norm minimization correctly recovers the original low- 
rank matrix [4], [6], [15], [16]. Most of these papers 
have focused on the matrix completion subproblem (see 
Section III), in which the measurements are simply 
entries of the unknown matrix. A main purpose of 
this paper is to compare the theoretical results in the 
matrix completion problem to those possible with 'less 
coherent' measurement ensembles. 

A. Organization of the paper 

In the first half of the paper (Section II), we present 
new theoretical results concerning low-rank matrix re- 
covery from measurements obeying a certain restricted 
isometry property, thereby extending and refining the 
work of Recht et al. in [15]. A first important question 
we address here (and in the matrix completion sub- 
problem) is this: how many measurements are necessary 
to recover a low-rank matrix? By taking the singular
value decomposition of $M \in \mathbb{R}^{n_1 \times n_2}$ with
$\operatorname{rank}(M) = r$, one can see that $M$ has
$(n_1 + n_2 - r)r$ degrees of freedom. This can be much lower
than $n_1 n_2$ for $r \ll \min(n_1, n_2)$, suggesting that one may
be able to recover a low-rank matrix from substantially fewer than
$n_1 n_2$ measurements. In fact, it has been shown [15]
that one may oversample the degrees of freedom by a 
logarithmic factor and still exactly recover M via nuclear-norm 
minimization with high probability. In this paper, we 
show that for certain classes of linear measurements, 
one can reduce the number of measurements to a small 
multiple of $(n_1 + n_2 - r)r$, and still attain exact matrix 
recovery via nuclear-norm minimization. Further, when 
the measurements are corrupted by noise, we suggest 
a nuclear norm based algorithm that takes into account 
the noise in the model and show that the error when 
using this algorithm is order optimal. Lastly, when M 
has decaying singular values, the error bounds are refined 
and extended to exhibit an optimal bias-variance trade-off 
(explained in more detail in Section II). 

In the second half of the paper (Section III), we review 
the theory on matrix completion, noting that this is a 
much different problem because the RIP does not hold. 
We begin the section by comparing different theoretical 
results regarding nuclear norm minimization. We also 
remark that other competing algorithms have arisen 
to tackle low-rank matrix completion. To the authors' 
best knowledge, only one such alternative algorithm, 
proposed by Montanari et al. [11], [12], has rigorous 
theoretical backing. We review the theory proposed by 
these authors and highlight some of the differences 
between their approach and nuclear-norm minimization. 
We conclude this section by reviewing the noisy matrix 
completion results, and comparing them to the results 
when the RIP holds. 

B. Notation 

In the remainder of the paper, we assume $M$ is square,
with $n_1 = n_2 = n$, in order to simplify the notation.
Simple generalizations of our results, however, hold for
rectangular matrices. Below, $\|X\|$ refers to the operator
norm of $X$ (the largest singular value), $\|X\|_{1,\infty}$ is the
magnitude of the largest entry of $X$,

$$\|X\|_{1,\infty} = \max_{i,j} |X_{ij}|,$$

and $\|X\|_F$ is the Frobenius norm. The standard basis
vectors are denoted by $e_i$, and $\mathcal{A}^*$ is the adjoint of the
operator $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^m$, so that

$$[\mathcal{A}(X)]_i = \langle A_i, X\rangle \quad\Longleftrightarrow\quad \mathcal{A}^*(v) = \sum_{i=1}^{m} v_i A_i.$$

The singular value decomposition of $M$ (with
$\operatorname{rank}(M) = r$) is written as

$$M = \sum_{i=1}^{r} \sigma_i u_i v_i^* = U \Sigma V^*, \tag{I.3}$$

with $U, V \in \mathbb{R}^{n \times r}$, $\Sigma \in \mathbb{R}^{r \times r}$ for orthogonal matrices
$U$, $V$ and the diagonal matrix of singular values, $\Sigma$.

II. Random linear measurements 

A difficulty in the matrix completion problem is that 
unless all of the entries of the unknown matrix are 
sampled, there is always a rank-1 matrix in the null space 
of the sampling operator (see Section III). This leads to 
the necessity of requirements below on the flatness of 
the singular vectors of the underlying unknown matrix. 
Interestingly, such assumptions are not necessary when 
considering other classes of measurement ensembles. In 
a paper bridging the gap between compressive sensing 
and low-rank matrix recovery [15], the authors prove 
that many random measurement ensembles often satisfy 
a restricted isometry property (RIP), which guarantees 
that low-rank matrices cannot lie in the null space of A 
(or cannot lie 'close' to the null space of A). 

Definition 1: For each integer $r = 1, 2, \ldots, n$, define
the isometry constant $\delta_r$ of $\mathcal{A}$ as the smallest quantity
such that

$$(1 - \delta_r)\|X\|_F^2 \le \|\mathcal{A}(X)\|_{\ell_2}^2 \le (1 + \delta_r)\|X\|_F^2 \tag{II.1}$$

holds for all matrices of rank at most $r$.
A measurement ensemble, $\mathcal{A}$, is said to obey the RIP at
rank $r$ if $\delta_r \le \delta < 1$ for a constant $\delta$ whose appropriate
values will be specified in what follows.

How many measurements, m, are necessary to ensure 
that the RIP holds at a given rank r? To first achieve a 
lower bound on this quantity, note that the set of rank r 
matrices contains the set of matrices which are restricted 
to have nonzero entries only in the first r rows. This is 
an $nr$-dimensional vector space and thus we must 
have $m \ge nr$, or otherwise there will be a rank-r matrix 
in the null space of A regardless of what measurements 
are used. The following theorem shows that for certain 
classes of random measurements, this lower bound can 
be achieved to within a constant factor. 

Theorem 2: Fix $0 \le \delta < 1$ and let $\mathcal{A}$ be a random
measurement ensemble obeying the following property:
for any given $X \in \mathbb{R}^{n \times n}$ and any fixed $0 < t < 1$,

$$P\left(\left|\|\mathcal{A}(X)\|_{\ell_2}^2 - \|X\|_F^2\right| > t\|X\|_F^2\right) \le C\exp(-cm) \tag{II.2}$$

for fixed constants $C, c > 0$. If $m \ge Dnr$ then $\mathcal{A}$
satisfies the RIP with isometry constant $\delta_r \le \delta$ with
probability exceeding $1 - Ee^{-dm}$ for fixed constants
$D, E, d > 0$.

As an example of a generic measurement ensemble obeying
(II.2), if each $A_i$ contains iid mean zero Gaussian
entries with variance $1/m$ then $m \cdot \|\mathcal{A}(X)\|_{\ell_2}^2 / \|X\|_F^2$
is distributed as a chi-squared random variable with $m$
degrees of freedom. Thus, applying a standard concentration
bound,

$$P\left(\left|\|\mathcal{A}(X)\|_{\ell_2}^2 - \|X\|_F^2\right| > t\|X\|_F^2\right) \le 2e^{-\frac{m}{2}\left(t^2/2 - t^3/3\right)}$$

and (II.2) is satisfied. Similarly, each $A_i$ can be composed
of iid sub-Gaussian random variables to achieve
the concentration bound (II.2). Thus one way to interpret
Theorem 2 is that 'most' properly normalized measurement
ensembles satisfy the RIP nearly as soon as is
theoretically possible, where the measure used to define
'most' is Gaussian (or sub-Gaussian).
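
The concentration property (II.2) can be observed numerically for a single fixed matrix. The following is a minimal sketch, not a verification of the RIP itself (which must hold uniformly over all low-rank matrices); the helper names `gaussian_ensemble` and `energy_ratio` and the sizes used are illustrative assumptions.

```python
import numpy as np

def gaussian_ensemble(m, n, rng):
    """Return an (m, n*n) array whose i-th row is vec(A_i), entries iid N(0, 1/m)."""
    return rng.normal(scale=1.0 / np.sqrt(m), size=(m, n * n))

def energy_ratio(A, X):
    """Compute ||A(X)||_2^2 / ||X||_F^2, where [A(X)]_i = <A_i, X>."""
    y = A @ X.ravel()
    return float(y @ y) / float(np.sum(X * X))

rng = np.random.default_rng(0)
n, r, m = 20, 2, 400
A = gaussian_ensemble(m, n, rng)
# A fixed rank-r test matrix, drawn independently of A.
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
ratio = energy_ratio(A, X)
print(ratio)  # close to 1 when m is large
```

Since $m$ times this ratio is chi-squared with $m$ degrees of freedom, its standard deviation is $\sqrt{2/m}$, which is about 0.07 for the sizes chosen here.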

Theorem 2 is inspired by a similar theorem in 
[15, Theorem 4.2] and refines this result in two ways. 
First, it shows that one must only oversample the number 
of degrees of freedom of a rank r matrix by a constant 
factor in order to obtain the RIP at rank r (which 
improves on the theoretical result in [15] by a factor 
of $\log n$). Second, it shows that one must only require 
a single concentration bound on A, removing another 
assumption required in [15]. 

A. Minimax Error Bound 

Using the RIP, Recht et al. [15] show that exact
recovery of $M$ occurs when solving the convex problem
(I.2) provided that $\operatorname{rank}(M) = r$ and $\delta_{5r} < \delta$ for a certain
constant $\delta \approx 0.2$. We extend this result by considering the
noisy problem,

$$y = \mathcal{A}(M) + z, \tag{II.3}$$

where for simplicity the noise, $z$, is assumed to be
Gaussian with iid mean zero entries of variance $\sigma^2$.

In this case, we analyze the performance of a version
of (I.2) which takes noise into account, and is analogous
to the Dantzig Selector algorithm [5]:

$$\begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \|\mathcal{A}^*(r)\| \le \lambda \\ & r = y - \mathcal{A}(X), \end{array} \tag{II.4}$$

where $\lambda = C\sqrt{n}\,\sigma$ for an appropriate constant $C$. A
heuristic intuition for this choice of $\lambda$ is as follows:
suppose that $\mathcal{A}$ is simply the operator which stacks the
columns of its argument into a vector, so that $\mathcal{A}^*\mathcal{A}$ is
the identity operator, and $\mathcal{A}^*(z)$ is an $n \times n$ matrix with
iid Gaussian entries. This is perhaps the simplest case
to analyze. We would like the unknown matrix $M$ to
be a feasible point, which requires that $\|\mathcal{A}^*(z)\| \le \lambda$.
It is well known that the top singular value of a square
$n \times n$ Gaussian matrix, with per-entry variance $\sigma^2$, is
concentrated around $\sqrt{2n}\,\sigma$, and thus we require
$\lambda \ge \sqrt{2n}\,\sigma$. Further, observe that in this simple setting the
solution to (II.4) can be explicitly calculated, and is
equal to $T_\lambda(M + \mathcal{A}^*(z))$ where the operator $T_\lambda$
soft-thresholds the singular values of its argument by $\lambda$. If
$\lambda$ is too large, then $T_\lambda(M + \mathcal{A}^*(z))$ becomes strongly
biased towards zero, and thus (loosely) $\lambda$ should be as
small as possible while still allowing $M$ to be feasible,
leading to the choice $\lambda \approx \sqrt{2n}\,\sigma$ for this simple case.
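
The soft-thresholding operator in this simple setting can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the problem sizes, and the constant $C = 2.5$ in $\lambda = C\sqrt{n}\,\sigma$ are all chosen here for the example. When $\mathcal{A}^*\mathcal{A}$ is the identity, the constraint in (II.4) reduces to a bound on the operator norm of the matrix residual, which the thresholded estimate satisfies by construction.

```python
import numpy as np

def soft_threshold_singular_values(Y, lam):
    """T_lambda(Y): shrink each singular value of Y by lam, clipping at zero."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(1)
n, r, sigma = 30, 2, 0.1
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))  # rank-r ground truth
Z = sigma * rng.normal(size=(n, n))                    # plays the role of A*(z)
lam = 2.5 * np.sqrt(n) * sigma                         # C = 2.5 is an assumed value
Y = M + Z
M_hat = soft_threshold_singular_values(Y, lam)

# Operator norm of the residual; its singular values are min(s_i, lam).
residual_norm = np.linalg.norm(Y - M_hat, 2)
print(residual_norm <= lam + 1e-9)
```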

We are now prepared to present the simplest version 
of our theoretical error bounds. The following theorem 
states that if M has low rank then the error is order 
optimal with overwhelming probability. 

Theorem 3: Suppose that $\mathcal{A}$ has RIP constant $\delta_{4r} < \sqrt{2} - 1$
and $\operatorname{rank}(M) = r$. Let $\hat{M}$ be the solution to (II.4).
Then

$$\|\hat{M} - M\|_F^2 \le C n r \sigma^2 \tag{II.5}$$

with probability at least $1 - De^{-dn}$ for fixed numerical
constants $C, D, d > 0$.

The result in this theorem is quite similar to the adaptive 
error bound in compressive sensing first proved in [5] 
and the proofs are almost identical (see [2] for a proof). 
To see how the result generalizes to a rectangular matrix
$M \in \mathbb{R}^{n_1 \times n_2}$, the error
bound (II.5) is replaced by

$$\|\hat{M} - M\|_F^2 \le C \max(n_1, n_2)\, r \sigma^2.$$

We compare the above error bound, (II.5), to the
minimax error bound described below.

Theorem 4: Any estimator $\hat{M}(y)$, with $y = \mathcal{A}(M) + z$,
obeys

$$\sup_{M : \operatorname{rank}(M) \le r} E\|\hat{M} - M\|_F^2 \ge \frac{1}{1 + \delta_r}\, n r \sigma^2. \tag{II.6}$$

In other words, the minimax error over the class of
matrices of rank at most $r$ is lower bounded by just
about $n r \sigma^2$.

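Thus the error achieved by solving a convex program is
within a constant of the expected minimax error (with
high probability). As an exercise, and to help further
understand the error bound (II.5), we analyze the error in
the example above in which $\mathcal{A}^*\mathcal{A}$ is the identity operator
and $\hat{M} = T_\lambda(M + \mathcal{A}^*(z))$. In this case, letting
$\tilde{M} = M + \mathcal{A}^*(z)$,

$$\begin{aligned} \|\hat{M} - M\| &= \|T_\lambda(\tilde{M}) - \tilde{M} + \mathcal{A}^*(z)\| \\ &\le \|T_\lambda(\tilde{M}) - \tilde{M}\| + \|\mathcal{A}^*(z)\| \\ &\le 2\lambda \end{aligned}$$

assuming that $\lambda \ge \|\mathcal{A}^*(z)\|$. Then,

$$\|\hat{M} - M\|_F^2 \le \|\hat{M} - M\|^2 \operatorname{rank}(\hat{M} - M) \le 4\lambda^2 \operatorname{rank}(\hat{M} - M). \tag{II.7}$$

Once again, assuming that $\lambda \ge \|\mathcal{A}^*(z)\|$, we have
$\operatorname{rank}(\hat{M} - M) \le \operatorname{rank}(\hat{M}) + \operatorname{rank}(M) \le 2r$
(by Weyl's inequality, the singular values of $\tilde{M}$ beyond the $r$th
are at most $\|\mathcal{A}^*(z)\| \le \lambda$ and are therefore thresholded to zero).
Plugging this in with $\lambda = C\sqrt{n}\,\sigma$ gives the error bound (II.5).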
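
The chain of inequalities in this exercise can be checked numerically. The sketch below is illustrative only: it assumes the identity-measurement setting described above, and it sets $\lambda$ slightly above the realized operator norm of the noise matrix (an assumed choice that makes the feasibility condition hold exactly rather than with high probability).

```python
import numpy as np

def svt(Y, lam):
    """Soft-threshold the singular values of Y by lam."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(2)
n, r, sigma = 40, 3, 0.05
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
Z = sigma * rng.normal(size=(n, n))     # plays the role of A*(z)
lam = 1.01 * np.linalg.norm(Z, 2)       # assumed choice: just above ||A*(z)||
M_hat = svt(M + Z, lam)

op_err = np.linalg.norm(M_hat - M, 2)               # operator-norm error
fro_err_sq = np.linalg.norm(M_hat - M, "fro") ** 2  # squared Frobenius error

print(op_err <= 2 * lam)                 # first chain: error at most 2*lambda
print(fro_err_sq <= 8 * r * lam ** 2)    # (II.7) with rank(M_hat - M) <= 2r
```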

B. Oracle Error Bound 

While achieving the minimax error is useful, in many 
cases minimax analysis is overly focused on worst- 
case-scenarios and more adaptive error bounds can be 
reached. This is exactly the case when M has decaying 
singular values, with many singular values below the 
'noise level' of $\sqrt{n}\,\sigma$. In order to set the bar for error 
bounds in this case, we compare to the error achievable 
with the aid of an oracle. 

To develop an oracle bound, consider the family of
estimators defined as follows: for each $n \times r$ orthogonal
matrix $U$, define $\hat{M}[U]$ as the minimizer to (II.8)

$$\min\left\{\|y - \mathcal{A}(\hat{M})\|_{\ell_2} : \hat{M} = UR \text{ for some } R\right\}. \tag{II.8}$$

In other words, we fix the column space (the linear space
spanned by the columns of the matrix $U$), and then find
the matrix with that column space which best fits the
data. Knowing the true matrix $M$, an oracle or a genie
would then select the best column space to use so as to
minimize the mean-squared error (MSE)

$$\inf_U E\|M - \hat{M}[U]\|_F^2. \tag{II.9}$$

The question is whether it is possible to mimic the
performance of the oracle and achieve a MSE close to
(II.9) with a real estimator.
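
For a fixed column space, the estimator in (II.8) is an ordinary least-squares problem in the coefficient matrix $R$. A minimal sketch follows, assuming the operator is stored as an $m \times n^2$ array acting on the column-major vectorization of its argument (a representation chosen here for illustration); it uses the identity $\mathrm{vec}(UR) = (I \otimes U)\,\mathrm{vec}(R)$.

```python
import numpy as np

def oracle_estimate(A, y, U):
    """Least-squares fit of y to A(U R) over R; A is (m, n*n) on column-major vec(X)."""
    n, r = U.shape
    design = A @ np.kron(np.eye(n), U)   # vec(U R) = (I kron U) vec(R)
    vec_R, *_ = np.linalg.lstsq(design, y, rcond=None)
    R = vec_R.reshape((r, n), order="F")
    return U @ R

rng = np.random.default_rng(3)
n, r, m = 8, 2, 40
U, _ = np.linalg.qr(rng.normal(size=(n, r)))   # orthonormal basis of the column space
M = U @ rng.normal(size=(r, n))                # rank-r matrix with that column space
A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n * n))
y = A @ M.flatten(order="F")                   # noiseless measurements y = A(M)
M_hat = oracle_estimate(A, y, U)
print(np.allclose(M_hat, M))
```

In this noiseless example with the true column space and $m \ge nr$, the fit recovers $M$ exactly; with noise, the same routine returns the estimator whose error the oracle bound describes.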

Through classical calculations, one may lower bound
$E\|M - \hat{M}[U]\|_F^2$ (the steps required will be shown in detail
in the sequel) as follows: we have

$$E\|M - \hat{M}[U]\|_F^2 \ge \|P_{U^\perp}(M)\|_F^2 + \frac{n r \sigma^2}{1 + \delta_r},$$

where $P_{U^\perp}(M) = (I - UU^*)M$. The first term is a
bound on the bias of the estimator which occurs when $U$
does not span the column space of $M$ while the second
term is a bound on the variance which grows as the
dimension of $U$ grows. Thus the oracle error is lower
bounded by

$$\inf_U E\|M - \hat{M}[U]\|_F^2 \ge \inf_U \left[\|P_{U^\perp}(M)\|_F^2 + \frac{n r \sigma^2}{1 + \delta_r}\right].$$

Now for a given dimension $r$, the best $U$ (that minimizing
the proxy for the bias term $\|P_{U^\perp}(M)\|_F^2$) spans
the top $r$ singular vectors of the matrix $M$ and thus we
obtain

$$\inf_U E\|M - \hat{M}[U]\|_F^2 \ge \inf_r \left[\sum_{i > r} \sigma_i^2(M) + \frac{n r \sigma^2}{1 + \delta_r}\right],$$

which for convenience we simplify to

$$\inf_U E\|M - \hat{M}[U]\|_F^2 \ge \frac{1}{2} \sum_i \min(\sigma_i^2, n\sigma^2). \tag{II.10}$$

The right-hand side has a nice interpretation. If $\sigma_i^2 > n\sigma^2$,
one should try to estimate the rank-1 contribution
$\sigma_i u_i v_i^*$ and pay the variance term (which is about $n\sigma^2$)
whereas if $\sigma_i^2 \le n\sigma^2$, we should not try to estimate this
component, and pay a squared bias term equal to $\sigma_i^2$.
In other words, the right-hand side may be interpreted 
as an ideal bias-variance trade-off, which can be nearly 
achieved with the help of an oracle. 
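
The right-hand side of (II.10), read as one half of the sum of $\min(\sigma_i^2, n\sigma^2)$ over the singular values, is simple to evaluate. The short sketch below uses hypothetical singular values chosen only to show the trade-off; the function name is an assumption.

```python
import numpy as np

def oracle_lower_bound(singular_values, n, sigma):
    """Evaluate 0.5 * sum_i min(sigma_i^2, n * sigma^2)."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    return 0.5 * float(np.sum(np.minimum(s2, n * sigma ** 2)))

# Hypothetical singular values 10, 3, 0.5 with n = 4 and noise level sigma = 1.
# The threshold n * sigma^2 equals 4: the first two components pay the variance
# term 4 each, while the third pays its squared bias 0.25.
bound = oracle_lower_bound([10.0, 3.0, 0.5], n=4, sigma=1.0)
print(bound)  # 4.125
```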

The following theorem states that when M has low 
rank, one achieves the optimal bias-variance trade-off 
when solving a convex optimization problem, up to a 
constant factor. 

Theorem 5: Suppose that $\mathcal{A}$ has RIP constant $\delta_{4r} < \sqrt{2} - 1$
and $\operatorname{rank}(M) = r$. Let $\hat{M}$ be the solution to (II.4).
Then

$$\|\hat{M} - M\|_F^2 \le C \sum_{i=1}^{r} \min(\sigma_i^2, n\sigma^2)$$

with probability at least $1 - De^{-dn}$ for some numerical
constants $C, D, d > 0$.
For a proof, see the upcoming paper [2].

C. Approximately low-rank, noisy, error bounds 

An important drawback of the above two theorems 
(Theorems 3 and 5) is that they only apply when M is 
exactly a low-rank matrix, but do not generally apply 
when M is well approximated by a low-rank matrix. 
However, for many random measurement ensembles A, 
the above result can be extended to handle the case when 
all n of the singular values of M are nonzero. This is 
the content of the following theorem. 

Theorem 6: Fix $M$. Suppose that each 'row' $A_i$ of $\mathcal{A}$
contains iid mean zero Gaussian entries with variance
$1/m$. Suppose $m \le cn^2/\log n$ for some numerical
constant $c$. Let $\bar{r}$ be the largest integer such that
$\delta_{4\bar{r}} \le \frac{1}{4}(\sqrt{2} - 1)$. Let $\hat{M}$ be the solution to (II.4). Then

$$\|\hat{M} - M\|_F^2 \le C\left(\sum_{i=1}^{\bar{r}} \min(\sigma_i^2, n\sigma^2) + \sum_{i=\bar{r}+1}^{n} \sigma_i^2\right) \tag{II.11}$$

with probability greater than $1 - De^{-dn}$ for fixed numerical
constants $C, D, d > 0$.
Here, $\bar{r}$ is the largest value of $r$ such that the RIP
holds and thus $\bar{r} \ge c\,m/n$ with high probability for a
fixed numerical constant $c$ (see Theorem 2). The constant
$\frac{1}{4}$ in $\delta_{4\bar{r}} \le \frac{1}{4}(\sqrt{2} - 1)$ is arbitrary and could be
replaced by any constant less than 1. The error bound
has an interesting intuitive interpretation: decompose $M$
as $M = M_{\bar{r}} + M_c$ with

$$M_{\bar{r}} = \sum_{i=1}^{\bar{r}} \sigma_i u_i v_i^*, \qquad M_c = \sum_{i=\bar{r}+1}^{n} \sigma_i u_i v_i^*,$$

so that $M_{\bar{r}}$ is the projection of $M$ onto rank-$\bar{r}$ matrices.
Then we achieve the near optimal bias-variance trade-off
in estimating $M_{\bar{r}}$, but cannot recover $M_c$.

An important point about Theorem 6 is that it is an
example of instance optimality: the result holds with high
probability for any given specific $M$, but it does not hold
uniformly over all $M$. For the proof, see [2].

III. Matrix completion 

A highly applicable subset of low-rank matrix re- 
covery problems concerns the recovery of an unknown 
matrix from a subset of its entries (matrix completion). 
An example to bear in mind is the Netflix problem 
in which one sees a few movie ratings for each user, 
which can be viewed as a row of (possible) ratings 
with only a few entries filled in. Stacking the rows 
together creates the data matrix. Netflix would like to 
guess how each user would rate a movie he had not 
seen, in order to target advertising. A great difficulty is 
that there are always rank-1 matrices in the null space 
of the measurement operator and, thus, our problem is 
'RIPless'. 

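In order to specialize the nuclear-norm minimization
algorithm (I.2) to matrix completion, let $\Omega$ be the set of
observed entries. We assume $\Omega$ is chosen uniformly at
random with $|\Omega| = m$ (this turns the discussion away
from adversarial sampling sets). Define
$\mathcal{P}_\Omega : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$
to be the operator setting to zero each unobserved
entry,

$$[\mathcal{P}_\Omega(X)]_{ij} = \begin{cases} X_{ij} & \text{if } (i,j) \in \Omega \\ 0 & \text{if } (i,j) \notin \Omega. \end{cases} \tag{III.1}$$

Then one solves

$$\begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(M). \end{array} \tag{III.2}$$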
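
The sampling operator and the 'RIPless' difficulty noted above can be made concrete with a short sketch. The helper names `sample_mask` and `P_Omega` are illustrative assumptions; the example builds a rank-1 matrix supported on a single unobserved entry and confirms that it is mapped to zero.

```python
import numpy as np

def sample_mask(n, m, rng):
    """Boolean (n, n) mask with exactly m True entries chosen uniformly at random."""
    mask = np.zeros(n * n, dtype=bool)
    mask[rng.choice(n * n, size=m, replace=False)] = True
    return mask.reshape(n, n)

def P_Omega(X, mask):
    """Keep the observed entries of X and set the unobserved entries to zero."""
    return np.where(mask, X, 0.0)

rng = np.random.default_rng(4)
n, m = 6, 20
mask = sample_mask(n, m, rng)

# Pick an unobserved location (one exists because m < n * n) and form the
# rank-1 matrix with a single nonzero entry there.
i, j = np.argwhere(~mask)[0]
E = np.zeros((n, n))
E[i, j] = 1.0

print(np.linalg.matrix_rank(E), P_Omega(E, mask).any())  # rank 1, yet invisible
```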

To the best of our knowledge, there are five papers
with novel theoretical guarantees on noiseless matrix
completion [4], [6], [11], [12], [16]. We compare the
results of this prior literature in Table I. The parameters
$\mu$, $\mu_0$, $\mu_1$, $\mu_2$, $\mu_B$, $\kappa$ in Table I are defined further on in
this section, but for now note that they depend on the
structure of the underlying matrix, $M$, and in many
cases are small (e.g. $O(1)$ or $O(\log n)$) under differing
assumptions on $M$.

| Assumptions on M | Number of measurements m required | Paper/Theorem |
|---|---|---|
| M is generic * | $Cn^{5/4} r \log n$, or $Cn^{6/5} r \log n$ if $r \le n^{1/5}$ | [4], Thm 1.1 |
| none | $C\max(\mu_1^2, \mu_0^{1/2}\mu_1, \mu_0 n^{1/4})\, n r \log n$, or $C\mu_0 n^{6/5} r \log n$ if $r \le \mu_0^{-1} n^{1/5}$ | [4], Thm 1.3 |
| M is generic * | $Cnr \log^8 n$ if $r \ge \log n$, or $Cnr \log^6 n$ if $r = O(1)$ | [6], Cor. 1.6 |
| $r = O(1)$ | $C\mu_B^4 n \log^2 n$ | [6], Cor. 1.5 |
| none | $C\mu^2 n r \log^6 n$ | [6], Thm 1.2 |
| M is generic *, $r \le c_1 n$ ** | $\max(c_2 n^2, m_0)$ ** | [16], Thm 2.5 |
| none | $Cn\kappa^2 \max(\mu_0 r \log n,\ \mu_0^2 r^2 \kappa^2,\ \mu_2^2 r^2 \kappa^4)$ | [11], Thm 1.2 |

TABLE I: Comparison of different theoretical guarantees
for matrix completion. When the requirements on
$M$ and the number of measurements are met, and the
measurements are chosen uniformly at random, then exact
matrix completion is guaranteed with probability at least
$1 - cn^{-3}$ (for a fixed constant $c$). $C$ is also a fixed
constant. The algorithm used to produce the results in
the last line is OPTSPACE; the rest of the table refers to
nuclear-norm minimization (III.2).

* $M$ is drawn from the random orthogonal model which
is defined below. Intuitively, under this model the singular
vectors of $M$ have no structure and are thus 'generic'.
** The constants $c_1$ and $c_2$ satisfy $c_1, c_2 < 1$ and $m_0$ is
a fixed integer.

A. Nuclear-norm minimization algorithms 

We first review the results of [4], which pioneered the 
matrix completion theory. As described therein, assump- 
tions on M are vital to ensure that matrix completion 
is possible. To motivate this line of reasoning, suppose
$M = e_i e_j^*$ is a (rank-1) matrix with only 1 nonzero entry.
If this entry is not seen, then $M$ is in the null space
of the measurement operator and is indistinguishable
from the zero matrix. Such observations are explored
in more depth in [4], [6], [7] providing an argument for
the necessity of the assumption that the singular vectors
of $M$ are 'spread', which is also intrinsically important
to bounding the size of $\mu_B$, $\mu_0$, $\mu_1$, $\mu_2$ and $\mu$ (but has no
relation to $\kappa$).

In order to quantify 'spread', with parameter $\mu_B$, the
authors of [4] require

$$\|u_k\|_{\ell_\infty} \le \sqrt{\mu_B / n}, \qquad \|v_k\|_{\ell_\infty} \le \sqrt{\mu_B / n} \tag{III.3}$$

for each $u_k, v_k$ (recall these are the singular vectors of
$M$). Note that the minimum value of $\mu_B$ is 1 if all of
the singular vectors have minimal $\ell_\infty$ norm, and that $\mu_B$
can be as large as $n$ when a singular vector has only one
nonzero entry. When $r = O(1)$, the constants $\mu_0$, $\mu_1$ and
$\mu$ are all $O(1) \cdot \mu_B$ (see [4], [6]), thus bounding all of
the parameters involved in the nuclear norm theoretical
results.
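
The two extremes of the spread parameter can be illustrated numerically. The sketch assumes that the bound (III.3) constrains the largest entry of each singular vector by $\sqrt{\mu_B/n}$, which is the reading consistent with the stated range from 1 to $n$; the function name `mu_B` is hypothetical.

```python
import numpy as np

def mu_B(M, r):
    """Smallest mu_B such that every entry of the top-r singular vectors
    has magnitude at most sqrt(mu_B / n)."""
    n = M.shape[0]
    U, _, Vt = np.linalg.svd(M)
    largest = max(np.abs(U[:, :r]).max(), np.abs(Vt[:r, :]).max())
    return n * largest ** 2

n = 16
flat = np.ones((n, n)) / n      # rank 1 with perfectly spread singular vectors
spike = np.zeros((n, n))
spike[0, 0] = 1.0               # rank 1 with a single nonzero entry

print(mu_B(flat, 1), mu_B(spike, 1))  # approximately 1 and n
```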

In order to prove theoretical guarantees for larger
values of the rank, [4] introduces the concept of the
incoherence of $M$ with parameters $\mu_0$ and $\mu_1$ as defined
below. Let $P_U = UU^*$ be the projection onto the
range of the left singular vectors of $M$ and similarly
let $P_V = VV^*$. Then [4] requires,

$$\max_{1 \le i \le n} \|P_U e_i\|_{\ell_2}^2,\ \max_{1 \le i \le n} \|P_V e_i\|_{\ell_2}^2 \le \frac{r}{n}\mu_0, \qquad \|UV^*\|_{1,\infty} \le \frac{\sqrt{r}}{n}\mu_1.$$

A matrix $M$ is said to be incoherent if $\mu_0$ and $\mu_1$ are
small (e.g. $O(1)$ or $O(\log n)$). Note that these parameters,
and thus the number of measurements required in
Theorem 1.3 of [4], have no dependence on the singular
values of $M$, a quality shared by all of the
parameters involved in the nuclear-norm minimization
theory.

Which matrices are incoherent? As noted above, if
$r = O(1)$ then $\mu_0, \mu_1 \le O(1) \cdot \mu_B$ and thus the matrices
with 'spread' singular vectors are incoherent. To address
this question from another angle, introduce the random
orthogonal model mentioned in Table I.

Definition 7: A matrix $M = U\Sigma V^*$ of rank $r$ is said
to be drawn from the random orthogonal model if $U$
is drawn uniformly at random from the set of $n \times r$
orthogonal matrices and similarly for $V$, although $U$ and
$V$ may be dependent on each other.
This is perhaps the most generic possible random model
for the singular vectors of a matrix. Under this model for
values of the rank $r$ greater than $\log n$ (to avoid small
sample effects) $\mu_0 = O(1)$ and $\mu_1 = O(\log n)$ with
very large probability [4]. A way to interpret this is that
'most' matrices have small values of $\mu_0, \mu_1$.
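
A draw from the random orthogonal model and its incoherence parameters can be sketched as follows. The sketch takes the incoherence bounds to be $\max_i \|P_U e_i\|^2 \le \mu_0 r/n$ (and likewise for $V$) and $\|UV^*\|_{1,\infty} \le \mu_1 \sqrt{r}/n$, and it samples $U$ and $V$ independently through the QR factorization of a Gaussian matrix with a sign correction, a standard construction; the helper names and sizes are assumptions.

```python
import numpy as np

def random_orthonormal(n, r, rng):
    """n x r matrix with orthonormal columns, uniformly distributed
    (QR of a Gaussian matrix, with the signs of R's diagonal fixed)."""
    Q, R = np.linalg.qr(rng.normal(size=(n, r)))
    return Q * np.sign(np.diag(R))

def coherence_params(U, V):
    """Smallest (mu_0, mu_1) satisfying the two incoherence bounds for U, V."""
    n, r = U.shape
    # ||P_U e_i||^2 equals the squared Euclidean norm of row i of U.
    row_energy = max((U ** 2).sum(axis=1).max(), (V ** 2).sum(axis=1).max())
    mu0 = n / r * row_energy
    mu1 = n / np.sqrt(r) * np.abs(U @ V.T).max()
    return mu0, mu1

rng = np.random.default_rng(5)
n, r = 200, 10
U = random_orthonormal(n, r, rng)
V = random_orthonormal(n, r, rng)
mu0, mu1 = coherence_params(U, V)
print(mu0, mu1)
```

Because the row energies of $U$ sum to $r$, the value of $\mu_0$ always lies between 1 and $n/r$; for generic draws it stays near the lower end of that range.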

With the variables $\mu_0$ and $\mu_1$ defined, along with
the random orthogonal model, the reader is equipped
to evaluate the theoretical results of [4] in Table I.
One sees that for 'most' matrices, or alternatively, for
incoherent matrices (those with small values of $\mu_0, \mu_1$),
it is required that $m \ge n^{1.2} r$ or $m \ge n^{1.25} r$ (depending
on $r$), ignoring log and constant factors. While these
results show that one can drastically undersample a
matrix when $r \ll n$, they are above the theoretical limit
of $(2n - r)r \approx nr$ by a factor of about $n^{0.2}$ or $n^{0.25}$. With


the aid of some slightly stronger assumptions on M, [6] 
removes these extra small powers of n and nearly attains 
the theoretical limit. 

In order to present these optimal results [6] that
apply for values of the rank $r$ greater than $O(1)$, the
authors introduce the strong incoherence property with
parameter $\mu$, which we now state: it is required that for
all pairs $(a, a')$ and $(b, b')$ with $1 \le a, a', b, b' \le n$,

$$\left|\langle e_a, P_U e_{a'}\rangle - \frac{r}{n} 1_{a = a'}\right| \le \mu \frac{\sqrt{r}}{n}, \qquad \left|\langle e_b, P_V e_{b'}\rangle - \frac{r}{n} 1_{b = b'}\right| \le \mu \frac{\sqrt{r}}{n}.$$

Secondly, it is required that $\mu \ge \mu_1$ (with $\mu_1$ defined
above). As in [4], the random orthogonal model obeys
$\mu = O(\log n)$ with high probability [6]. Examining
Table I, one sees that for $\mu = O(\log n)$, the number
of measurements required is within a polylogarithmic
factor of the theoretical lower limit.

Is the polylogarithmic factor necessary in the bounds 
above? This answer depends on the size of r. As argued 
in [4], [6, Theorem 1.7], when $r = O(1)$ it is generally 
impossible to recover M by any algorithm if one does 
not oversample the degrees of freedom by at least a 
factor of $\log n$. However, as shown in [16], when r is of 
the same order as n and M is drawn from the random 
orthogonal model, one can oversample the degrees of 
freedom by a constant factor (while still undersampling 
M), and still have exact recovery with high probability. 

B. OPTSPACE 

We now turn to the algorithm OPTSPACE proposed 
in [11], [12]. This algorithm has three steps, as (roughly) 
described below. 

(1) Remove the columns and rows that contain a dis- 
proportionate amount of sampled entries (trimming) 
in order to prevent these measurements from overly 
influencing the singular vectors in the next step. 

(2) Project the result of step 1 onto the space of rank r 
matrices and renormalize in order to attain an initial 
approximation of M.¹ 

(3) Perform local minimization via gradient descent 
over a locally convex, but globally nonconvex, 
function F(-) described in [11], [12], which has M 
as a local minimum. 

¹ It is assumed that r is known in this step. The authors of [11], [12]
suggest estimating r using the trimmed matrix from step 1, or testing
different values of r.

The intuitive idea of the algorithm is that the first 2
steps provide an accurate initial guess for M and that
the function F(·) behaves like a parabola near M (with
M achieving the minimum of the parabola) and thus
gradient descent will recover M.
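
The first two steps described above can be sketched as follows. This is a rough illustration rather than the OPTSPACE implementation of [11], [12]: the trimming threshold (twice the average number of samples per row or column) and the rescaling factor $n^2/m$ are assumptions made here, the function names are hypothetical, and step 3 is omitted because the function F(·) is not specified in this text.

```python
import numpy as np

def trim(Y_obs, mask):
    """Step 1: zero out rows and columns holding more than twice the
    average number of samples (assumed threshold)."""
    m, n = mask.sum(), mask.shape[0]
    keep_rows = mask.sum(axis=1) <= 2 * m / n
    keep_cols = mask.sum(axis=0) <= 2 * m / n
    return Y_obs * np.outer(keep_rows, keep_cols)

def rank_r_init(Y_trimmed, mask, r):
    """Step 2: best rank-r approximation, rescaled by n^2 / m (assumed
    normalization) to compensate for the unobserved entries."""
    n, m = mask.shape[0], mask.sum()
    U, s, Vt = np.linalg.svd(Y_trimmed, full_matrices=False)
    return (n * n / m) * (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(6)
n, r, m = 50, 2, 1000
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
mask = np.zeros(n * n, dtype=bool)
mask[rng.choice(n * n, size=m, replace=False)] = True
mask = mask.reshape(n, n)

M0 = rank_r_init(trim(M * mask, mask), mask, r)   # initial guess for step 3
print(np.linalg.matrix_rank(M0))
```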

The success of OPTSPACE is theoretically tied to the
values of the parameters $\kappa$, $\mu_0$ and $\mu_2$. The parameter
$\mu_0$ has been introduced above, while $\kappa$ is the condition number

$$\kappa = \sigma_1 / \sigma_r.$$

The parameter $\mu_2$ is somewhat analogous to $\mu_1$ above.
In fact, [11], [12] require

$$\left\|\sum_{i=1}^{r} \frac{\sigma_i}{\sigma_1} u_i v_i^*\right\|_{1,\infty} \le \frac{\sqrt{r}}{n}\mu_2.$$

In the special case where the singular values of $M$ are
all equal so that $\kappa = 1$, $\mu_1$ and $\mu_2$ have equivalent
definitions, supporting the intuition that when $\kappa = O(1)$
the two parameters are comparable. In this setting, and
if $r = O(\log n)$, [11] presents strong theoretical results,
comparable to those of [6], but with smaller powers of
the parameters involved and the logarithms. However,
the applicability of the theory depends strongly on the
assumption that $\kappa$ is small, whereas when using nuclear-norm
minimization, the variations in the nonzero singular
values are inconsequential to the exact recovery
results.
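
The relationship between the two parameters can be checked on a small example. The sketch assumes the reading in which the bound of [11], [12] weights each rank-1 term by the ratio of its singular value to the largest one, so that the definition reduces to that of $\mu_1$ when all singular values coincide; the function name and sizes are illustrative.

```python
import numpy as np

def kappa_mu1_mu2(U, s, V):
    """Condition number and the smallest mu_1, mu_2 for given factors,
    with s sorted in decreasing order."""
    n, r = U.shape
    kappa = s[0] / s[-1]
    mu1 = n / np.sqrt(r) * np.abs(U @ V.T).max()
    mu2 = n / np.sqrt(r) * np.abs((U * (s / s[0])) @ V.T).max()
    return kappa, mu1, mu2

rng = np.random.default_rng(8)
n, r = 60, 4
U, _ = np.linalg.qr(rng.normal(size=(n, r)))
V, _ = np.linalg.qr(rng.normal(size=(n, r)))

# Equal singular values: kappa = 1 and the two parameters coincide.
k_eq, mu1_eq, mu2_eq = kappa_mu1_mu2(U, np.full(r, 3.0), V)
# Spread singular values: kappa grows and the parameters generally differ.
k_sp, mu1_sp, mu2_sp = kappa_mu1_mu2(U, np.array([100.0, 10.0, 1.0, 0.1]), V)
print(k_eq, k_sp)
```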

C. Noisy matrix completion 

As explained above, there is always a rank-1 matrix 
in the null space of the operator sampling the entries, 
and thus the RIP does not hold. To understand the 
difficulty this creates, consider that in the related field of 
compressive sensing, 'RIPless' error bounds have proved 
extremely elusive. To the authors' best knowledge, there 
is only one paper with such results [3], but it requires that 
every element of the signal should stand above the noise 
level. Despite this difficulty, two recent papers [7], [11] 
prove that matrix completion is robust vis-a-vis noise 
(using nuclear-norm minimization in [7] and OPTSPACE 
in [11]). In order to state these results, we first specify 
the noisy matrix completion problem. 

The noisy model assumes

$$Y_{ij} = M_{ij} + Z_{ij}, \qquad (i,j) \in \Omega, \tag{III.4}$$

where $\{Z_{ij} : (i,j) \in \Omega\}$ is a noise term and, as before,
$\Omega$ is chosen uniformly at random with $|\Omega| = m$. Another
way to express this model is as

$$\mathcal{P}_\Omega(Y) = \mathcal{P}_\Omega(M) + \mathcal{P}_\Omega(Z),$$

for some noise matrix $Z$ (the entries of $Z$ outside of $\Omega$
are irrelevant).



D. Stability with nuclear-norm minimization 

The recovery algorithm analyzed in [7] is a relative of 
the Dantzig Selector, and once again draws its roots from 
an analogous algorithm in compressive sensing, this time 
the Lasso: 

$$\begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \|\mathcal{P}_\Omega(X) - \mathcal{P}_\Omega(Y)\|_F \le \delta. \end{array} \tag{III.5}$$

This time, $\delta$ should be larger than the Frobenius norm of
the noise, i.e. $\delta \ge \|\mathcal{P}_\Omega(Z)\|_F$, at least stochastically.²
Thus, the algorithm just minimizes the proxy for the
rank, while keeping within the noise level.
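
The choice of $\delta$ quoted in the footnote for iid Gaussian noise, $\delta^2 = (m + \sqrt{8m})\sigma^2$, can be examined by simulation. The sketch below estimates how often the noise energy on the observed entries stays within this level; the trial count, sizes, and function name are assumptions made for illustration.

```python
import numpy as np

def delta_squared(m, sigma):
    """delta^2 = (m + sqrt(8 m)) * sigma^2 for iid N(0, sigma^2) noise on m entries."""
    return (m + np.sqrt(8.0 * m)) * sigma ** 2

rng = np.random.default_rng(7)
m, sigma, trials = 500, 0.3, 2000

# Each row holds the m observed noise entries of one trial; the row sum of
# squares is the squared Frobenius norm of the sampled noise.
noise_energy = ((sigma * rng.normal(size=(trials, m))) ** 2).sum(axis=1)
fraction_within = float(np.mean(noise_energy <= delta_squared(m, sigma)))
print(fraction_within)
```

The noise energy has mean $m\sigma^2$ and standard deviation $\sqrt{2m}\,\sigma^2$, so this level sits two standard deviations above the mean, and the constraint in (III.5) holds for the true matrix in the large majority of trials.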

The claim in [7] is that as soon as noiseless matrix 
completion is possible via nuclear-norm minimization, 
so is stable matrix completion (this argument is made 
in detail in [7]). We distill this result into the following 
simple theorem: 

Theorem 8: [7] Suppose that any of the requirements
in [4] or [6] for exact matrix completion in the noiseless
case are met (see Table I). Suppose $\|\mathcal{P}_\Omega(Z)\|_F \le \delta$. Let
$p = m/n^2$. Then the solution to (III.5), $\hat{M}$, obeys

$$\|\hat{M} - M\|_F \le 4\sqrt{\frac{C_p n}{p}}\,\delta + 2\delta, \qquad C_p = 2 + p, \tag{III.6}$$

with probability at least $1 - cn^{-3}$ for a fixed numerical
constant $c$.

While this result is noteworthy in that it has no
current analogue in compressive sensing,³ it falls short of
achieving oracle type error bounds. As described in [7]
an oracle error bound derived by giving away the column
space of $M$ in the noisy matrix completion problem is

$$\|M^{\text{Oracle}} - M\|_F \approx p^{-1/2}\delta$$

(this oracle error is focused on adversarial noise). One
sees that the oracle error is over-estimated by a factor of
about $\sqrt{n}$.

E. Stability with OPTSPACE 

Another recent and noteworthy theoretical error bound 
for noisy matrix completion appears in a paper by 
Montanari et al. [11]. Once again the OPTSPACE al- 
gorithm is used, and thus having a large spread in the 
singular values of M can cause instabilities. However, 
as described in the following theorem, under suitable 
conditions the error bounds are comparable to those 
achievable with the aid of an oracle (with stochastic 
noise). 

² For example, if the entries of $Z$ are iid $N(0, \sigma^2)$, one may take
$\delta^2 = (m + \sqrt{8m})\sigma^2$.

³ The authors are in the process of writing an analogous paper for
the compressive sensing case.

Theorem 9: [11] Suppose $\operatorname{rank}(M) = r$ and

$$m \ge Cn\kappa^2 \max(\mu_0 r \log n,\ \mu_0^2 r^2 \kappa^2,\ \mu_2^2 r^2 \kappa^4)$$

for a fixed numerical constant $C$. Let $\hat{M}$ be the solution
to the OPTSPACE algorithm. Then

$$\|\hat{M} - M\|_F \le C' \kappa^2 \frac{n^2 \sqrt{r}}{m} \|\mathcal{P}_\Omega(Z)\|$$

with probability at least $1 - 1/n^3$, assuming that the RHS
is smaller than $\sigma_r$, for a fixed numerical constant $C'$.
Here $\sigma_r$ is the smallest nonzero singular value of $M$.

When $Z$ contains iid Gaussian entries with variance
$\sigma^2$, the term $\|\mathcal{P}_\Omega(Z)\|$ can be bounded as

$$\|\mathcal{P}_\Omega(Z)\| \le C\left(\frac{m \log n}{n}\right)^{1/2} \sigma$$

with high probability (see [11]). Thus, in the regime
when $\kappa = O(1)$ and $\sigma_r \ge C\kappa^2 \frac{n^2\sqrt{r}}{m}\|\mathcal{P}_\Omega(Z)\|$, one has

$$\|\hat{M} - M\|_F^2 \le C \frac{n^3 r \log n}{m} \sigma^2$$

which is within a logarithmic factor of a simple oracle
bound discussed in [7], in which the exact column space
is given away and the noise is assumed to be stochastic.
Specifically, this is the oracle bound that one achieves
by examining the expected error of the estimator $\hat{M}[U]$
defined in equation (II.8), where $U$ is defined as in the
SVD $M = U\Sigma V^*$.

However, the class of low-rank matrices to which the
theorem applies is very restrictive, a problem that is
non-existent when the RIP holds. In order to see this, note
first that it is required that all of the singular values of $M$
stand far above the noise level. For example, if one sees
the entire matrix ($m = n^2$) then the theorem requires
$\sigma_r \ge C\kappa^2\sqrt{r}\,\|Z\|$, i.e. the minimal singular value of $M$
must be larger than the noise level by a factor of about
$\kappa^2\sqrt{r}$. Secondly, the number of measurements required
is at least $C\kappa^6\mu_2^2 n r^2$ and thus quickly grows much larger
than the degrees of freedom of $M$ when $\kappa$ and $r$ grow.

IV. Conclusion 

We have shown that a nuclear-norm minimization
algorithm (II.4) recovers a low-rank matrix from the
noisy data $\langle A_i, M\rangle + z_i$, $i = 1, \ldots, m$, in which each $A_i$
is Gaussian (or sub-Gaussian), and enjoys the following
properties:

1) For both exact recovery from noiseless data and
accurate recovery from noisy data, the number of
measurements $m$ must only exceed the number of
degrees of freedom by a constant factor.

2) With high probability the error bound is within a 
constant factor of the expected minimax error. 



3) With high probability the error bound achieves an 
optimal bias-variance trade-off (up to a constant). 

4) The error bounds extend to the case when M has 
full rank (with many 'small' singular values). 

We close this paper with a few questions that we 
leave open for future research. Can the 'RIPless' the- 
oretical guarantees be improved? In particular, in the 
case of nuclear-norm minimization based algorithms, can 
the error bound be tightened? And for other tractable 
algorithms, can we achieve strong error bounds without 
requiring the nonzero singular values of M to be nearly 
constant? Finally, are there useful applications in which 
the measurements are 'incoherent' enough that the RIP 
provably holds? 